@bioball bioball commented Aug 8, 2025

Here is the reference PR: apple/pkl#1169

@bioball bioball force-pushed the pkldoc-io-improvements branch from 34b4378 to 297e827 Compare August 8, 2025 19:47

. Separate runtime data into package-level runtime data and package-version level runtime data.
. Generate package-level runtime data by consuming the previously generated package-level runtime data.
. Eliminate known subtype and known usage information at a cross-package level.

Contributor

This greatly limits the usefulness of this information. To my knowledge, Javadoc, KDoc, Scaladoc and Pkl’s IntelliJ plugin all support cross-package information.

Member Author

Pkldoc would still support cross-package information, but only "up" (who am I extending, what types am I using), not "down" (who is using me).

As far as I know, neither Javadoc, KDoc, nor Scaladoc show information downwards.

IMO: I don't know how useful this information is, either. For commonly used enough types, this information becomes quite noisy.

Also: cross-package references like this are actually broken right now (they were intended to work, but they don't).
So, this actually isn't a regression.

Contributor

> As far as I know, neither Javadoc, KDoc, nor Scaladoc show information downwards.

From what I can see they all do, at least for subtypes.

Example (see “all known subinterfaces” etc.): https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/util/Collection.html

Member Author

Ah, interesting! TIL.

But, this type of information is still only maintained at some sort of local level. pkldoc is somewhat different, in that we can potentially generate docs for many, many packages. Imagine if the JDK's docs showed known subclasses of Collection for 3rd-party libs; that list would grow massively pretty quickly.

Contributor

@odenix odenix Aug 15, 2025

> But, this type of information is still only maintained at some sort of local level.

It’s as local or global as the user running Javadoc wants it to be. A Javadoc site can document any number of Java modules. It can also include information from any number of upstream Javadoc websites, which of course can’t depend on code documented by the current Javadoc site (only the other way around). I thought Pkldoc worked in the same way.

The JDK’s Javadoc documents dozens of modules, hundreds of packages, and thousands of classes. Even though every JDK version has its own Javadoc site, a single site is probably larger than Pkl Pantry’s multi-version Pkldoc.

Javadoc also tracks usages, but on a separate page: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/util/class-use/Collection.html

If you’d like to reduce the amount of processing and information presented, one option is to track subtypes across packages and to drop usages altogether. This feels more useful to me than tracking some info across packages and other info only within a package.

Another option is to limit usage tracking to the package or package+module level.

Member Author

@bioball bioball Aug 15, 2025

pkldoc is somewhat different; in some cases, pkldoc sites serve as a central package documentation site (think https://pkg.go.dev/ or https://docs.rs). And in these cases, the site can be quite big; probably already several orders of magnitude larger than package_docs.

Contributor

@odenix odenix Aug 15, 2025

I see. “Arbitrarily” tracking some info only within packages is very surprising/limiting. If there’s no way around this, perhaps it should be reflected in the title, e.g., “subtypes within package”. Alternatively, this could be an option only used for central package doc sites.

Member Author

Yeah, I suppose we can have a flag that determines whether known subtypes/usages are tracked across packages or not. And if not, we can change the label to "Subtypes within package", etc.

By the way, this is actually broken today (cross-package known usage/subtype doesn't work).
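
A minimal sketch of how such a flag might behave, assuming a hypothetical `cross_package` option and a simplified model in which each type records its package and direct supertype. None of these names come from pkldoc, and the default label "Known subtypes" is an assumption; the sketch only illustrates the filtering and label change discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TypeInfo:
    """Simplified stand-in for a documented type (hypothetical model)."""
    package: str
    name: str
    supertype: Optional[str] = None  # qualified name "package/Name", or None


def qualified(t: TypeInfo) -> str:
    return f"{t.package}/{t.name}"


def known_subtypes(target: TypeInfo, all_types: List[TypeInfo], cross_package: bool) -> List[str]:
    """Return qualified names of the direct subtypes of `target`.

    With cross_package=False, only subtypes in the target's own package are kept.
    """
    result = []
    for t in all_types:
        if t.supertype != qualified(target):
            continue
        if not cross_package and t.package != target.package:
            continue
        result.append(qualified(t))
    return sorted(result)


def subtypes_label(cross_package: bool) -> str:
    """Pick a section label that states the scope of the list."""
    return "Known subtypes" if cross_package else "Subtypes within package"


base = TypeInfo("pkg.a", "Base")
types = [
    base,
    TypeInfo("pkg.a", "Local", supertype="pkg.a/Base"),
    TypeInfo("pkg.b", "Remote", supertype="pkg.a/Base"),
]
```

Here `known_subtypes(base, types, cross_package=False)` keeps only `pkg.a/Local`, while `cross_package=True` also includes `pkg.b/Remote`.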

The known versions data will be recorded in the package level JSON runtime data.
The other two kinds of data will be recorded in the package-version level runtime data.

The known-subtypes and known-usages relationships will only be recorded inter-package.

Contributor

Did you mean “intra-package”?

Member Author

Yup, thanks!
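
To make the proposed split concrete, here is a rough sketch of the two levels of runtime data. The field names (`knownVersions`, `knownSubtypes`, `knownUsages`) and the function are hypothetical and do not reflect pkldoc's actual JSON schema. Known versions go into the package-level document; known subtypes and known usages go into the package-version-level document and keep only intra-package entries.

```python
def build_runtime_data(package, version, known_versions, subtypes, usages):
    """Split data into package-level and package-version-level documents.

    `subtypes` and `usages` are lists of (package, type_name) pairs.
    Entries that belong to other packages are dropped (intra-package only).
    """
    def same_package(refs):
        return sorted(name for pkg, name in refs if pkg == package)

    # Deduplicate while preserving the order in which versions were recorded.
    versions = list(dict.fromkeys([*known_versions, version]))

    package_level = {"knownVersions": versions}
    version_level = {
        "knownSubtypes": same_package(subtypes),
        "knownUsages": same_package(usages),
    }
    return package_level, version_level


pkg_level, ver_level = build_runtime_data(
    package="pkg.a",
    version="1.1.0",
    known_versions=["1.0.0"],
    subtypes=[("pkg.a", "Local"), ("pkg.b", "Remote")],
    usages=[("pkg.a", "Config")],
)
```

In this example the subtype from `pkg.b` is discarded, so `ver_level` only lists `Local` and `Config`.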

==== Runtime data format changes

The existing runtime data files will be used to generate new runtime data files.
To improve machine readability of these files, they are changed from `.js` to `.json` files.

Contributor

Does this change turn these files into a public API? Is all contained information relevant/suitable for a public API? Have you considered having a separate public API?

Member Author

pkldoc itself doesn't really care whether these should be a public API or not.
I guess whether these URLs end up being a "public API" depends on who is running the web server.

For our own package docs site, I don't think we would support this being used as an API.

BTW: one alternative that I thought about for managing metadata is to use a database (e.g. provide a JDBC connection string when running pkldoc). However, this is a much more involved change that would require quite a big rewrite of pkldoc. I can add some notes about this in the "alternatives considered".

We're also considering having a package index at some point, which would enable use-cases like: "bump my package to the latest version". This is orthogonal to pkldoc, though.

Contributor

I don’t (only) mean whether these files are served, but whether a (local) consumer can rely on their presence and format (since you mentioned “improve machine readability”). I’m asking because I’m not sure if pkldoc’s internal metadata format should double as a public format with strict compatibility guarantees etc.

Member Author

Gotcha.

No; these files are only meant for pkldoc itself to consume. They're not meant to be an API for any other use cases, and it's possible that a future migration will change the current JSON files in a breaking way.
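
For illustration, the way pkldoc might consume its own previously generated package-level file could look roughly like the sketch below. The file name `package-data.json`, the `knownVersions` field, and the function are all assumptions made for the example, not pkldoc's real layout. The point is only that plain JSON can be read back, merged with the new version, and rewritten without parsing JavaScript.

```python
import json
import tempfile
from pathlib import Path


def update_package_data(path: Path, new_version: str) -> dict:
    """Merge `new_version` into the package-level file at `path`.

    Reads the previously generated file if it exists, so versions recorded
    by earlier runs are carried forward.
    """
    data = {"knownVersions": []}
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    if new_version not in data["knownVersions"]:
        data["knownVersions"].append(new_version)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "package-data.json"  # hypothetical file name
    update_package_data(target, "1.0.0")      # first run: no previous file
    result = update_package_data(target, "1.1.0")  # second run: consumes it
```

After the second call, `result` contains both versions, even though each run only knew about the version it was documenting.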

@bioball bioball merged commit 91db018 into apple:main Nov 3, 2025
@bioball bioball deleted the pkldoc-io-improvements branch November 3, 2025 20:38